Skip to content

Support auto-allocate VLAN for SubnetConnectionBindingMap#1445

Open
wenqiq wants to merge 4 commits into
vmware-tanzu:mainfrom
wenqiq:topic/wenqi/VLAN-auto-allocation
Open

Support auto-allocate VLAN for SubnetConnectionBindingMap#1445
wenqiq wants to merge 4 commits into
vmware-tanzu:mainfrom
wenqiq:topic/wenqi/VLAN-auto-allocation

Conversation

@wenqiq

@wenqiq wenqiq commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

When spec.vlanTrafficTag is not set, the subnetbinding controller allocates a VLAN from [1, 4094], writes it to spec, and realizes the NSX binding map. Used VLANs are discovered via NSX search on parent Subnet paths so bindings created outside the Supervisor are included.

TestDone:

  1. Ensure the Testbed has the latest nsx-operator deployed with the VLAN auto-allocation feature.
    before deploy the new manager:
0e09e45f585481698f5895b569d57ed2
  1. Create a YAML file test-vlan-auto.yaml containing a parent subnet, two child subnets, and a binding map:
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  name: parent-subnet
  namespace: default
spec:
  ipv4SubnetSize: 16
---
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  name: child-subnet-1
  namespace: default
spec:
  ipv4SubnetSize: 16
---
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  name: child-subnet-2
  namespace: default
spec:
  ipv4SubnetSize: 16
---
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: SubnetConnectionBindingMap
metadata:
  name: binding-auto-1
  namespace: default
spec:
  subnetName: child-subnet-1
  targetSubnetName: parent-subnet
  # Intentionally omit vlanTrafficTag to trigger auto-allocation
a7874d761ca643942e507aa66be222fd
  1. Verify Auto-allocation

Run kubectl apply -f test-vlan-auto.yaml.
Wait for the resources to be ready: kubectl get subnetconnectionbindingmap binding-auto-1.
Check the allocation result: kubectl get subnetconnectionbindingmap binding-auto-1 -o yaml.
Expected Result: spec.vlanTrafficTag is automatically injected (e.g., 0, 100, or another value within the [0, 4094] range).

7a1fe4de34621bfbd7cfde509268004e c75f5a5bfc8dcf83ccedba3fe5dcf779

@wenqiq wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 018d05f to 3443291 Compare June 2, 2026 11:09
@codecov-commenter

codecov-commenter commented Jun 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 66.21005% with 74 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.93%. Comparing base (c44e899) to head (929221e).

Files with missing lines Patch % Lines
pkg/nsx/services/subnetbinding/vlan_allocation.go 63.15% 23 Missing and 5 partials ⚠️
...trollers/subnetbinding/subnetbinding_controller.go 61.42% 23 Missing and 4 partials ⚠️
pkg/nsx/util/utils.go 0.00% 10 Missing ⚠️
pkg/nsx/services/vlanpool/vlanpool.go 85.00% 5 Missing and 4 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1445      +/-   ##
==========================================
- Coverage   78.05%   77.93%   -0.13%     
==========================================
  Files         186      188       +2     
  Lines       25171    25382     +211     
==========================================
+ Hits        19648    19782     +134     
- Misses       4204     4267      +63     
- Partials     1319     1333      +14     
Flag Coverage Δ
unit-tests 77.93% <66.21%> (-0.13%) ⬇️
Files with missing lines Coverage Δ
pkg/nsx/services/subnetbinding/builder.go 93.18% <100.00%> (ø)
pkg/nsx/services/subnetbinding/subnetbinding.go 92.07% <100.00%> (ø)
pkg/nsx/services/vlanpool/vlanpool.go 85.00% <85.00%> (ø)
pkg/nsx/util/utils.go 86.31% <0.00%> (-1.86%) ⬇️
...trollers/subnetbinding/subnetbinding_controller.go 73.69% <61.42%> (-3.42%) ⬇️
pkg/nsx/services/subnetbinding/vlan_allocation.go 63.15% <63.15%> (ø)

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@wenqiq wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 3443291 to 6ffa2b2 Compare June 2, 2026 13:12
@wenqiq wenqiq marked this pull request as ready for review June 2, 2026 17:48
@dantingl dantingl requested review from wenyingd and yanjunz97 June 4, 2026 06:27

@dantingl dantingl left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more local test cases that some existing NSX BindingMap already allocate some VLAN ID

Comment thread pkg/controllers/subnetbinding/subnetbinding_controller.go Outdated
Comment thread pkg/nsx/services/vlanpool/vlanpool.go Outdated
}
}

if err := r.patchSpecVlanTrafficTag(ctx, bindingMap, vlan); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, it is not suggested to let the reconciler update SubnetConnectionBindingMap spec, which is possible to break user's intent, e.g., we may not differentiate if this VLAN is set by user or by nsx-operator. Updating spec is different from updating status. So I would prefer to introduce a field in SubnetConnectionBindingMap.status to record the final VLAN, no matter it is from user's intent (spec) or from the auto-calculation of nsx-operator.

Secondly, the current implementation is 1) patch the CR first, and then 2) continue the reconciliation (sync to nsx subnet-connection-binding-map) without a re-queue. It makes the reconciler usage mixed. As we have patched the CR, another update event will be triggered, in which the CR could have all required configurations. So I would prefer to split the "VLAN allocation" and "nsx resource create/update" into two independenty reconciliations, e.g., 1) if VLAN is not specified in one event handling cycle, the reconciler only allocates VLAN, 2) if everything is ready, do the reconciliation to sync to NSX.

Even more, if we must go with the route to patch the CR spec rather than introducing a field in the status, what do you think to use webhook for allocating VLAN?

Personally, I prefer to use the status to show the auto-allcated VLAN rather than patching spec..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing. I've updated the implementation accordingly.

On splitting reconciliation: Agreed. VLAN allocation and NSX sync are now handled in separate reconcile cycles. When vlanTrafficTag is omitted, the reconciler only allocates a VLAN, persists it to the CR spec, and returns. A subsequent reconcile (triggered by the spec update) validates the VLAN and syncs to NSX.

On webhook-based allocation: I don't think a mutating webhook is a good fit here. VLAN allocation depends on dependent Subnets being ready and NSX state (cache/NSX query, pending pool, overlap retry), which fits the controller lifecycle better than admission.

On persisting to spec vs. status: We still persist the auto-allocated VLAN in spec.vlanTrafficTag (The API definition still needs to be confirmed by @jianjuns). To distinguish operator-allocated vs. user-provided VLANs after persistence, we set annotation nsx.vmware.com/auto_allocated_vlan_traffic_tag when writing the spec. This keeps pending VLAN pool commit/release correct in the second reconcile cycle.

Comment thread pkg/controllers/subnetbinding/subnetbinding_controller.go Outdated
func (r *Reconciler) reconcileVlanTrafficTag(ctx context.Context, bindingMap *v1alpha1.SubnetConnectionBindingMap, parentSubnetPaths []string) *errorWithRetry {
if bindingMap.Spec.HasVlanTrafficTag() {
vlan := *bindingMap.Spec.VLANTrafficTag
if err := r.VlanPoolService.ValidateManualVlan(parentSubnetPaths, vlan, string(bindingMap.UID)); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to verify the user input vlan tag?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this vlan tag is allocated by the controller? e.g., in round 1, controller successfully reconciles the CR, then user has made update, the resource enqueue again, then this function may break the controller to do the expected reconciliation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment thread pkg/nsx/services/vlanpool/vlanpool.go Outdated

// Allocate picks an available VLAN. When preferred is valid and unused on the target, it is returned;
// otherwise the smallest free ID from [0, 4094] is used.
func (s *Service) Allocate(parentSubnetPaths []string, excludeCRUID string, preferred int64) (int64, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try a valid tag (not used) from the cache first, then query from NSX only when NSX returns error that the value is already used?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment thread pkg/nsx/services/vlanpool/vlanpool.go
@wenqiq wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch 3 times, most recently from 797b822 to e59068d Compare June 9, 2026 07:02
@wenqiq

wenqiq commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Could you add more local test cases that some existing NSX BindingMap already allocate some VLAN ID

Updated the E2E test cases to include this.

@wenqiq wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 6cf497f to 7f070ee Compare June 9, 2026 11:13
@dantingl dantingl requested a review from jianjuns June 10, 2026 02:49
unique on the target Subnet or SubnetSet. When omitted, the operator auto-allocates a VLAN ID.
format: int64
maximum: 4094
minimum: 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the commit description (it now says [0-4094]) too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns NSX API supports VLAN [0-4094], so do you think we should keep the same range in CRD definition including VLAN 0, but do auto-allocation from VLAN 1?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VLAN 0 maps to the PARENT (VNIC) port. How can it be used for CHILD ports?

return nil
}

func (r *Reconciler) reconcileVlanTrafficTag(ctx context.Context, bindingMap *v1alpha1.SubnetConnectionBindingMap, parentSubnetPaths []string, fromNSX bool) *errorWithRetry {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a VLAN ext Subnet, could we first try using the Subnet's VLAN ID, when it is not used yet?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is no architectural relationship between the VLAN extension Subnet's VLAN ID and the vlanTrafficTag, correct? This aligns with previous comments that mentioned: #1445 (comment) #1445 (comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While what you said is true, from VLAN network consumer perspective - esp. for those who used to consume VLAN DVPG or SV VDS network directly, but migrate to VPC and VLAN extension - it would be much easier to keep using the same VLAN IDs in the guest.
Think about the major use case of Telco.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Updated.

@wenqiq wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 7f070ee to a2adbc0 Compare June 11, 2026 04:16
@@ -79,14 +79,13 @@ spec:
vlanTrafficTag:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns could you help share your thoughts regarding how to show the "auto-allocated" vlan in the CR?

We have two options,
option 1: Re-use SubnetConnectionBindingMap CR spec.vlanTrafficTag for the allocated VLAN,
option 2: keep spec.vlanTrafficTag as empty (the same as user input), and introduce a field in status to show the final VLAN configured in NSX.

The current implementation is option1, the advantage is we don't need to update the APIs. In the meanwhile, it is impossible to know whether the VLAN is allocated by system or provided by the user (user input spec is modified). For nsx-operator logic, it may have two request to kube-apiserver : 1) update the CR first and 2) update status with the conditions.

For option 2, the advantage is the original user intent is kept, and nsx-operator only send request to apiserver to "update status". In the meanwhile, we may need to introduce a new field in status to record the final VLAN used in NSX.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wenyingd, I`d like to add a point is that, during the reconcile process we set annotation nsx.vmware.com/auto_allocated_vlan_traffic_tag when writing the spec, this helps distinguish operator-allocated vs. user-provided VLANs. see #1445 (comment)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, K8s API design does not reuse the Spec field to populate system generated values, while NSX Policy API has such cases. Could you also check with @tnqn on this one?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked, and it appears these two examples are similar to our situation, right? K8s dynamically allocates values for omitted spec fields (e.g., ClusterIP, NodePort) and writes them back to the spec.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but there can also be an argument that Service v1 API is a legacy design pattern. Could we understand the NSX Policy API pattern for such cases to decide?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that can anyway happen since we will continue to support user specified VLAN ID in the Spec. The discussion here is just that when user does not specify a VLAN ID, should system auto-allocated VLAN ID be set in the same Spec field.

@tnqn tnqn Jun 18, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If status.vlanID is introduced, the consumer should use status.vlanID only, which resolves the above problem.

  • When user specify spec.vlanID, if it cannot be realized in NSX, status.vlanID will not be set.
  • When user specify spec.vlanID, if it can be realized in NSX, status.vlanID will be set to spec.vlanID.
  • When user omit spec.vlanID, status.vlanID will be set to the allocated VLAN ID.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean now. However, even without VLAN ID in status, the consumer should check whether the resource is realized from the status before using it?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could but it's more complex, especially if the vlan ID is mutable: consumer needs to know the realized status is for the previous ID or the current ID. There could be solutions anyway even if reusing the field in spec. It's just a recommendation to separate the user input and the actual status to make it simpler.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. When NSX supports VLAN ID auto-allocation and we also expose SubnetConnectionBindingMap in CCI, in your mind how should we define the CCI API - auto-allocated VLAN ID in Status? @tnqn

wenqiq added 3 commits June 26, 2026 09:06
When spec.vlanTrafficTag is not set, the subnetbinding controller allocates
a VLAN from [0, 4094], writes it to spec, and realizes the NSX binding map.
Used VLANs are discovered via NSX search on parent Subnet paths so bindings
created outside the Supervisor are included.

Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>
Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>
Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>
@wenqiq wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from 77f8dcf to 5751845 Compare June 26, 2026 01:52
Signed-off-by: Wenqi Qiu <wenqi.qiu@broadcom.com>
@wenqiq wenqiq force-pushed the topic/wenqi/VLAN-auto-allocation branch from d9faf0f to 929221e Compare June 26, 2026 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants